#> [1] "hello"
1 Introduction
- Overview and Motivation
- Related Work
- Research questions
2 Testing that R and Python work
#> 30.0
3 Data
- Sources
- Description
- Wrangling/cleaning
- Spotting mistakes and missing data (could be part of EDA too)
- Listing anomalies and outliers (could be part of EDA too)
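As a minimal sketch of the last two steps in the list above (spotting missing data and listing outliers), the snippet below counts missing values per column and flags prices with Tukey's 1.5 × IQR rule. The records are made-up toy values, the helper names `count_missing` and `iqr_outliers` are hypothetical, and the IQR rule is only one possible choice; the project may well use a different criterion.

```python
from statistics import quantiles

# Hypothetical toy records (price, number_of_rooms); None marks a missing value.
COLUMNS = ("price", "number_of_rooms")
raw = [
    (400_000, 2.5), (450_000, 3.5), (480_000, 3.5), (500_000, None),
    (None, 4.5), (520_000, 4.5), (550_000, 4.5), (610_000, 5.5),
    (9_800_000, 5.5),
]
rows = [dict(zip(COLUMNS, record)) for record in raw]


def count_missing(records):
    """Count missing (None) entries per column."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


missing = count_missing(rows)
prices = [r["price"] for r in rows if r["price"] is not None]
outliers = iqr_outliers(prices)
print(missing)
print(outliers)
```

With real data, the flagged values would be inspected by hand before deciding whether to drop, cap, or keep them.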
3.1 Loading and light cleaning (incomplete for now)
3.2 Change the path below
3.3 Loading and light cleaning (incomplete for now)
3.4 Histogram of prices
3.5 Histogram of prices for each property type
Note: only prices between 0 and 500,000 are plotted, so some outliers are not shown.
3.6 Histogram of prices for each year category
Note: only prices between 0 and 500,000 are plotted, so some outliers are not shown.
3.7 Histogram of prices for each canton
Note: only prices between 0 and 500,000 are plotted, so some outliers are not shown.
3.8 Histogram of prices for each number of rooms
Note: only prices between 0 and 500,000 are plotted, so some outliers are not shown.
The graph below also shows only apartments with fewer than 10 rooms (change the code if needed).
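The filtering described in the notes above can be sketched as follows. This is an illustration only: the listings are invented, `price_histogram` is a hypothetical helper, the bin width of 100,000 is arbitrary, and treating both price bounds as inclusive is an assumption, since the notes do not say. The room limit is optional because it applies only to the per-room histogram.

```python
from collections import Counter

PRICE_MIN, PRICE_MAX = 0, 500_000  # plotted range quoted in the notes


def price_histogram(listings, bin_width=100_000, max_rooms=None):
    """Count listings per price bin, keyed by the lower edge of each bin.

    Prices outside [PRICE_MIN, PRICE_MAX] are dropped. If max_rooms is set,
    listings with that many rooms or more (or unknown rooms) are dropped too.
    """
    counts = Counter()
    last_bin = (PRICE_MAX - 1) // bin_width
    for price, rooms in listings:
        if price is None or not (PRICE_MIN <= price <= PRICE_MAX):
            continue
        if max_rooms is not None and (rooms is None or rooms >= max_rooms):
            continue
        # A price equal to PRICE_MAX is counted in the last bin.
        index = min(int(price // bin_width), last_bin)
        counts[index * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical (price, number_of_rooms) pairs.
listings = [
    (95_000, 2.5), (180_000, 3.5), (240_000, 4.5), (260_000, 3.5),
    (500_000, 5.5), (320_000, 12), (1_250_000, 6.5),
]
all_rooms = price_histogram(listings)
under_ten = price_histogram(listings, max_rooms=10)
print(all_rooms)
print(under_ten)
```

Raising `PRICE_MAX` or dropping `max_rooms` brings the excluded outliers back into the counts, which is the change the notes invite.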
3.9 Test Regression
#>
#> Call:
#> lm(formula = price ~ number_of_rooms + canton + property_type +
#> year_category, data = properties)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -7013788  -514438  -138948   264464 21628996
#>
#> Coefficients:
#>                                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                     -677158      55739  -12.15  < 2e-16 ***
#> number_of_rooms                  337946       6166   54.81  < 2e-16 ***
#> cantonappenzell-ausser-rhoden   -464945     126861   -3.66  0.00025 ***
#> cantonappenzell-inner-rhoden    -874289     392590   -2.23  0.02596 *
#> cantonbasel-landschaft          -195701      57943   -3.38  0.00073 ***
#> cantonbasel-stadt                218682     105130    2.08  0.03753 *
#> cantonbern                      -478376      46221  -10.35  < 2e-16 ***
#> cantonfribourg                  -781416      48366  -16.16  < 2e-16 ***
#> cantongeneva                    2025260      62234   32.54  < 2e-16 ***
#> cantonglarus                    -573694     173301   -3.31  0.00093 ***
#> cantongrisons                     59982      71666    0.84  0.40262
#> cantonjura                      -801519      77323  -10.37  < 2e-16 ***
#> cantonlucerne                   -187979      73261   -2.57  0.01030 *
#> cantonneuchatel                 -353635      65590   -5.39  7.1e-08 ***
#> cantonnidwalden                  991055     244826    4.05  5.2e-05 ***
#> cantonobwalden                   366062     244712    1.50  0.13470
#> cantonschaffhausen              -584997     120601   -4.85  1.2e-06 ***
#> cantonschwyz                      18070     132558    0.14  0.89157
#> cantonsolothurn                 -784557      61024  -12.86  < 2e-16 ***
#> cantonst-gallen                 -404890      55918   -7.24  4.6e-13 ***
#> cantonthurgau                    -37337      63444   -0.59  0.55620
#> cantonticino                     125913      38499    3.27  0.00108 **
#> cantonuri                          9578     155772    0.06  0.95097
#> cantonvalais                    -219964      39781   -5.53  3.3e-08 ***
#> cantonvaud                        89914      40258    2.23  0.02553 *
#> cantonzug                        801241     153896    5.21  1.9e-07 ***
#> cantonzurich                     316099      49688    6.36  2.0e-10 ***
#> property_typeAttic flat          311019      45964    6.77  1.4e-11 ***
#> property_typeBifamiliar house     41841      42939    0.97  0.32986
#> property_typeChalet             1136804      56690   20.05  < 2e-16 ***
#> property_typeDuplex               -5091      56699   -0.09  0.92846
#> property_typeFarm house          237939     118848    2.00  0.04529 *
#> property_typeLoft                285442     291977    0.98  0.32827
#> property_typeRoof flat             4801      64587    0.07  0.94074
#> property_typeRustic house       -281265     249068   -1.13  0.25880
#> property_typeSingle house        389066      24252   16.04  < 2e-16 ***
#> property_typeTerrace flat         88662      87071    1.02  0.30856
#> property_typeVilla              1278283      38187   33.47  < 2e-16 ***
#> year_category1919-1945            10462      61602    0.17  0.86515
#> year_category1946-1960            76025      57261    1.33  0.18429
#> year_category1961-1970           232055      48444    4.79  1.7e-06 ***
#> year_category1971-1980           210609      43422    4.85  1.2e-06 ***
#> year_category1981-1990           237789      43679    5.44  5.3e-08 ***
#> year_category1991-2000           477554      45385   10.52  < 2e-16 ***
#> year_category2001-2005           519338      55369    9.38  < 2e-16 ***
#> year_category2006-2010           591351      48030   12.31  < 2e-16 ***
#> year_category2011-2015           724194      47219   15.34  < 2e-16 ***
#> year_category2016-2024           641233      36926   17.37  < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1240000 on 21363 degrees of freedom
#> (72 observations deleted due to missingness)
#> Multiple R-squared: 0.323, Adjusted R-squared: 0.321
#> F-statistic: 216 on 47 and 21363 DF, p-value: <2e-16
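To make the summary above easier to read, here is a small worked example of how a fitted price is assembled from the coefficients. It is a sketch, not part of the original analysis: the numbers are copied from the output above (only a subset), while `predict_price` and the dictionary names are hypothetical. The reference level of each factor does not appear in the output and is absorbed into the intercept, so it contributes an offset of 0.

```python
# Coefficients copied from the regression summary (subset only).
INTERCEPT = -677_158
PER_ROOM = 337_946
CANTON = {"geneva": 2_025_260, "zurich": 316_099, "bern": -478_376}
PROPERTY_TYPE = {"Villa": 1_278_283, "Single house": 389_066}
YEAR_CATEGORY = {"2016-2024": 641_233, "1991-2000": 477_554}


def predict_price(rooms, canton=None, property_type=None, year_category=None):
    """Fitted price from the linear model.

    Passing None for a factor selects its reference level, which is not
    listed in the summary and therefore adds nothing to the intercept.
    """
    price = INTERCEPT + PER_ROOM * rooms
    if canton is not None:
        price += CANTON[canton]
    if property_type is not None:
        price += PROPERTY_TYPE[property_type]
    if year_category is not None:
        price += YEAR_CATEGORY[year_category]
    return price


# A 4-room villa in Geneva built in 2016-2024:
# -677158 + 4 * 337946 + 2025260 + 1278283 + 641233 = 4619402
villa = predict_price(4, "geneva", "Villa", "2016-2024")

# Same number of rooms with every factor at its reference level.
baseline = predict_price(4)

# Geneva versus Bern, everything else held equal.
gap = CANTON["geneva"] - CANTON["bern"]
print(villa, baseline, gap)
```

Because the model is additive, the Geneva versus Bern gap is the same for every property type and size. With a multiple R-squared of 0.323 and a residual standard error of about 1,240,000, such point predictions are rough at best.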
4 Supervised learning
- Data splitting (if a training/test set split is enough for the global analysis, at least one CV or bootstrap must be used)
- Two or more models
- Two or more scores
- Tuning of one or more hyperparameters per model
- Interpretation of the model(s)
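The splitting and scoring requirements above can be illustrated with a minimal k-fold cross-validation sketch that compares two models on two scores (RMSE and MAE). Everything here is an assumption made for illustration: the data are synthetic, the two models (a constant mean and a one-predictor least-squares line) are stand-ins for whatever models the project ends up using, and hyperparameter tuning is left out.

```python
import random
from math import sqrt
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    centre = mean(ys)
    return lambda x: centre


def fit_line(xs, ys):
    """Ordinary least squares with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cross_validate(fit, xs, ys, k=5, seed=0):
    """Return (mean RMSE, mean MAE) over the k held-out folds."""
    rmses, maes = [], []
    for held_out in k_fold_indices(len(xs), k, seed):
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [ys[i] - model(xs[i]) for i in held_out]
        rmses.append(sqrt(mean(e * e for e in errors)))
        maes.append(mean(abs(e) for e in errors))
    return mean(rmses), mean(maes)


# Synthetic data: price grows linearly with rooms, plus Gaussian noise.
rng = random.Random(42)
rooms = [rng.choice([1.5, 2.5, 3.5, 4.5, 5.5, 6.5]) for _ in range(200)]
prices = [100_000 + 150_000 * r + rng.gauss(0, 50_000) for r in rooms]

scores = {
    name: cross_validate(fit, rooms, prices)
    for name, fit in [("mean", fit_mean), ("line", fit_line)]
}
for name, (rmse, mae) in scores.items():
    print(f"{name}: RMSE={rmse:.0f} MAE={mae:.0f}")
```

Using the same seed for both models keeps the folds identical, so the score differences reflect the models rather than the split.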
5 Unsupervised learning
- Clustering and/or dimension reduction
6 Conclusion
- Brief summary of the project
- Take home message
- Limitations
- Future work?